Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifest: improve Annotator interface with generics #3760

Merged

Conversation

anish-shanbhag
Copy link
Contributor

@anish-shanbhag anish-shanbhag commented Jul 15, 2024

Refactors manifest.Annotator to use generics and a simplified API. This eliminates the need to perform pointer manipulation and unsafe typecasting when defining a new Annotator. The goal of this change is to improve the Annotator interface while not changing any existing behavior.

BenchmarkNumFilesAnnotator shows roughly the same performance as master when compared
to the equivalent implementation of orderStatistic:

pkg: github.com/cockroachdb/pebble/internal/manifest
                     │     old     │             new              │
                     │   sec/op    │   sec/op     vs base         │
NumFilesAnnotator-10   1.635µ ± 1%   1.572µ ± 7%  ~ (p=0.065 n=6)

                     │    old     │              new              │
                     │    B/op    │    B/op     vs base           │
NumFilesAnnotator-10   536.0 ± 0%   536.0 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

                     │    old     │              new              │
                     │ allocs/op  │ allocs/op   vs base           │
NumFilesAnnotator-10   7.000 ± 0%   7.000 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@anish-shanbhag anish-shanbhag changed the title manifest: improve Annotator interface with generics manifest: improve Annotator interface with generics Jul 15, 2024
@anish-shanbhag anish-shanbhag marked this pull request as ready for review July 15, 2024 20:33
@anish-shanbhag anish-shanbhag requested a review from a team as a code owner July 15, 2024 20:33
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Much cleaner!

Reviewed 5 of 11 files at r1.
Reviewable status: 5 of 11 files reviewed, 4 unresolved discussions (waiting on @anish-shanbhag and @itsbilal)


-- commits line 21 at r1:
do you know where this additional allocation is coming from? A heap profile go test -run XXX -bench BenchmarkNumFilesAnnotator -memprofile mem.prof should help


compaction_picker.go line 1390 at r1 (raw file):

			// TODO(travers): Consider an alternative heuristic for elision of range-keys.
			deletionCriteria := f.Stats.RangeDeletionsBytesEstimate*10 >= f.Size || f.Stats.NumDeletions*10 > f.Stats.NumEntries
			return !f.IsCompacting() && f.StatsValid() && deletionCriteria, f.StatsValid()

nit: I think the previous structure with an early exit if !f.StatsValid() was a little clearer—the stats we use in computing deletionCriteria might not be initialized at all, and it's less obvious that we're not improperly using them


internal/manifest/annotator.go line 75 at r1 (raw file):

	n.annot = append(n.annot, annotation{
		annotator: a,
		vptr:      new(T),

The annotations where T = *fileMetadata end up double boxing (storing a **fileMetadata in vptr). Maybe the responsibility of boxing the value should be the responsibility of whoever defines T (eg, T should always be a pointer type to avoid an allocation). Then the PickFileAnnotator continues to use T = *fileMetadata, boxing once.


internal/manifest/annotator.go line 234 at r1 (raw file):

		return f1
	}
	return f2

nit: worth using a switch:

switch {
case f1 == nil:
    return f2
case f2 == nil:
    return f1
case fa.Compare(f1, f2):
    return f1
default:
    return f2
}

Copy link
Contributor Author

@anish-shanbhag anish-shanbhag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated benchmark results:

goos: darwin
goarch: arm64
pkg: github.com/cockroachdb/pebble/internal/manifest
                     │     old     │             new              │
                     │   sec/op    │   sec/op     vs base         │
NumFilesAnnotator-10   1.635µ ± 1%   1.572µ ± 7%  ~ (p=0.065 n=6)

                     │    old     │              new              │
                     │    B/op    │    B/op     vs base           │
NumFilesAnnotator-10   536.0 ± 0%   536.0 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

                     │    old     │              new              │
                     │ allocs/op  │ allocs/op   vs base           │
NumFilesAnnotator-10   7.000 ± 0%   7.000 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

Reviewable status: 3 of 11 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)


-- commits line 21 at r1:

Previously, jbowens (Jackson Owens) wrote…

do you know where this additional allocation is coming from? A heap profile go test -run XXX -bench BenchmarkNumFilesAnnotator -memprofile mem.prof should help

Spent a while trying to find the allocation in the Annotator, but turns out this was actually because require.EqualValues allocated a new uint64 when two different types were passed for comparison (in this case int and uint64. This is fixed now; I also added the updated benchmark results above.


compaction_picker.go line 1390 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: I think the previous structure with an early exit if !f.StatsValid() was a little clearer—the stats we use in computing deletionCriteria might not be initialized at all, and it's less obvious that we're not improperly using them

Makes sense, I've reverted this to more closely match with the structure we had before.


internal/manifest/annotator.go line 75 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

The annotations where T = *fileMetadata end up double boxing (storing a **fileMetadata in vptr). Maybe the responsibility of boxing the value should be the responsibility of whoever defines T (eg, T should always be a pointer type to avoid an allocation). Then the PickFileAnnotator continues to use T = *fileMetadata, boxing once.

Agreed that this is not ideal. That said, I'm curious what the potential downsides are (performance, edge cases, etc.) for the double boxing here. One of my goals was to abstract away the need for pointer manipulation entirely, making annotators like SumAnnotator more straightforward to implement. If we place the responsibility of boxing on whoever defines T, then they also have implement the initial allocation in Zero and merging into an existing value in Accumulate (i.e. very similar to the previous implementation, minus the typecasting).

Ideally there would be a way to distinguish when T is a pointer type and when it isn't, and avoid the double boxing based on that. But unfortunately, I haven't seen a way to do this with generics yet.


internal/manifest/annotator.go line 234 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: worth using a switch:

switch {
case f1 == nil:
    return f2
case f2 == nil:
    return f1
case fa.Compare(f1, f2):
    return f1
default:
    return f2
}

Good call - fixed this

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 11 files reviewed, 2 unresolved discussions (waiting on @anish-shanbhag and @itsbilal)


-- commits line 21 at r1:

Previously, anish-shanbhag (Anish Shanbhag) wrote…

Spent a while trying to find the allocation in the Annotator, but turns out this was actually because require.EqualValues allocated a new uint64 when two different types were passed for comparison (in this case int and uint64. This is fixed now; I also added the updated benchmark results above.

Ah nice


internal/manifest/annotator.go line 75 at r1 (raw file):
I think the most significant downside is that double boxing forces a new allocation. Without the additional boxing, we would be storing *filetMetadata directly as the annotation value without suffering any additional allocations.

If we place the responsibility of boxing on whoever defines T, then they also have implement the initial allocation in Zero and merging into an existing value in Accumulate (i.e. very similar to the previous implementation, minus the typecasting).

IMO, this is worth it. In the case of the *fileMetadata annotations, there is no initial allocation. The zero value is just a nil pointer. In all the non-nil instances, we'll just be passing around pointers to the already-allocated fileMetadata.

At least the pointer handling will be encapsulated within the sumAggregator for those cases.

@anish-shanbhag anish-shanbhag force-pushed the annotator-generics branch 2 times, most recently from 3cb8d41 to 7777df4 Compare July 22, 2024 16:28
Copy link
Contributor Author

@anish-shanbhag anish-shanbhag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 11 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)


internal/manifest/annotator.go line 75 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think the most significant downside is that double boxing forces a new allocation. Without the additional boxing, we would be storing *filetMetadata directly as the annotation value without suffering any additional allocations.

If we place the responsibility of boxing on whoever defines T, then they also have implement the initial allocation in Zero and merging into an existing value in Accumulate (i.e. very similar to the previous implementation, minus the typecasting).

IMO, this is worth it. In the case of the *fileMetadata annotations, there is no initial allocation. The zero value is just a nil pointer. In all the non-nil instances, we'll just be passing around pointers to the already-allocated fileMetadata.

At least the pointer handling will be encapsulated within the sumAggregator for those cases.

Makes sense, I hadn't considered the fact that we avoid the extra allocation here. I've updated to avoid the double boxing.

Refactors `manifest.Annotator` to use generics and a simplified API.
This eliminates the need to perform pointer manipulation and unsafe
typecasting when defining a new Annotator. The goal of this change is to improve the `Annotator` interface while not changing any existing behavior.

`BenchmarkNumFilesAnnotator` shows roughly the same performance as master when compared
to the equivalent implementation of `orderStatistic`:
```
pkg: github.com/cockroachdb/pebble/internal/manifest
                     │     old     │             new              │
                     │   sec/op    │   sec/op     vs base         │
NumFilesAnnotator-10   1.635µ ± 1%   1.572µ ± 7%  ~ (p=0.065 n=6)

                     │    old     │              new              │
                     │    B/op    │    B/op     vs base           │
NumFilesAnnotator-10   536.0 ± 0%   536.0 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

                     │    old     │              new              │
                     │ allocs/op  │ allocs/op   vs base           │
NumFilesAnnotator-10   7.000 ± 0%   7.000 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal
```
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

nice, looks great!

Reviewed 2 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @anish-shanbhag and @itsbilal)

Copy link
Contributor Author

@anish-shanbhag anish-shanbhag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)

@anish-shanbhag anish-shanbhag merged commit e36d078 into cockroachdb:master Jul 26, 2024
11 checks passed
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this pull request Jul 30, 2024
cockroachdb#3760 contained a bug which causes Annotator
values to be incorrectly aggregated when pointer values
should be overwritten. This is because the `vtyped`
variable was not being modified due to being on the
stack.

This change fixes this and adds a unit test for
`PickFileAggregator` to catch this issue in the future.

cockroachdb#3759 should already not be affected by
this due to the different way it handles aggregation.
anish-shanbhag added a commit to anish-shanbhag/pebble that referenced this pull request Jul 30, 2024
cockroachdb#3760 contained a bug which causes Annotator
values to be incorrectly aggregated when pointer values
should be overwritten. This is because the `vtyped`
variable was not being modified due to being on the
stack.

This change fixes this and adds a unit test for
`PickFileAggregator` to catch this issue in the future.

cockroachdb#3759 should already not be affected by
this due to the different way it handles aggregation.
anish-shanbhag added a commit that referenced this pull request Jul 31, 2024
#3760 contained a bug which causes Annotator
values to be incorrectly aggregated when pointer values
should be overwritten. This is because the `vtyped`
variable was not being modified due to being on the
stack.

This change fixes this and adds a unit test for
`PickFileAggregator` to catch this issue in the future.

#3759 should already not be affected by
this due to the different way it handles aggregation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants